Sequential design of experiments refers to problems of inference
characterized  by  the  fact  that  as  data  accumulate,  the  experimenter  can 
choose  whether  or  not  to  experiment  further.  If  he  decides  to  experiment 
further,  he  can  decide  which  experiment  to  carry  out  next  and  if  he  decides 
to  stop  experimentation,  he  must  decide  what  terminal  decision  to  make. 

The  literature  contains  two  broad  types  of  general  approach  and 
several  major  classes  of  applications.  One  general  approach  is  that  of 
stochastic approximation. Three variations are the Robbins-Monro methods,
Box-Wilson response surface methods and the up-and-down methods. The
other general approach consists of finding optimal, or asymptotically
optimal designs, generally in a Bayesian decision theoretic context.

Special  classes  of  applications  include  survey  sampling,  multilevel 
APPROACHES IN

SEQUENTIAL  DESIGN  OF  EXPERIMENTS 

by 

Herman  Chernoff 

1.  Introduction 

Sequential  design  of  experiments  refers  to  problems  of  inference 
characterized by the fact that as data accumulate, the experimenter
can  choose  whether  or  not  to  experiment  further.  If  he  decides  to 
experiment  further,  he  can  decide  which  experiment  to  carry  out  next, 
and  if  he  decides  to  stop  experimentation,  he  must  decide  what  terminal 
decision  to  make. 

In  principle,  ordinary  sequential  analysis,  where  there  is  no 
choice  of  experiment  but  where  one  must  simply  decide  whether  or  not 
to  repeat  a  specified  experiment,  is  a  special  if  slightly  degenerate 
case  of  sequential  design.  The  same  can  be  said  for  double  sampling, 
where  the  experimental  choice  reduces  to  selecting  the  size  of  the  first 
sample  and,  given  the  outcome,  the  size  of  the  second  sample.  Indeed 
double  sampling  may  be  regarded  as  the  origin  of  sequential  analysis 
and  hence  of  sequential  design  of  experiments.  With  the  exception  of 
a  few  references  of  special  interest,  we  shall  avoid  the  discussion 
of  these  degenerate  cases,  and  we  shall  concentrate  mainly  on  problem 
areas  and  theories  where  there  is  a  choice  of  experimentation  after 
each  observation.  We  shall  do  this  in  our  search  for  general  insights 
even though double sampling is probably of more practical interest
than  the  remainder  of  sequential  experimentation. 

In recognition of the importance of a theory of sequential design,
Robbins [R1] proposed the two-armed bandit problem as a prototype
problem of possibly fundamental importance. Two variations of the
simplest version are the following. In both there exist probabilities
$p_1$ and $p_2$ corresponding to the probability of success with two arms.
Selecting an observation from arm $i$ leads to a success with
probability $p_i$, $i = 1, 2$. The two alternative hypotheses are
$H_1: (p_1, p_2) = (p_{10}, p_{20})$ and $H_2: (p_1, p_2) = (p_{20}, p_{10})$,
where $p_{10}$ and $p_{20}$ are distinct specified probabilities. Thus one
knows both probabilities, but one doesn't know which corresponds to
which arm. Each hypothesis is assumed to be equally likely. After each
observation the experimenter may select the arm to be used next until
$N$ observations have been taken. In one variation the object is to make
the choices so as to maximize the probability of deciding which
hypothesis is true after the $N$-th observation. In the other variation
the object is to maximize the expected total number of successes in $N$
trials. The second version is the one usually referred to as the
two-armed bandit problem and seems to confront the major issue more
directly. How does one compromise between the anticipated cost and the
value of the information? For in that problem the choice of the arm less
likely (according to the posterior probability) to have the larger
probability would constitute a sacrifice of immediate gain in the hope
of information which could lead to ultimate profit.
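
The two-point prior makes the posterior computation elementary, and a
short sketch may help fix ideas. The code below is an illustration only:
the function names, the simulation helper and the tie-breaking rule are
assumptions introduced here, not part of the original formulation. It
computes the posterior probability of $H_1$ from the success and failure
counts on each arm and then plays the arm more likely to have the larger
success probability.

```python
import random


def posterior_h1(p10, p20, s1, f1, s2, f2):
    """Posterior probability of H1: (p1, p2) = (p10, p20), prior 1/2 each.

    s_i and f_i are the successes and failures observed so far on arm i.
    """
    like_h1 = p10**s1 * (1 - p10)**f1 * p20**s2 * (1 - p20)**f2
    like_h2 = p20**s1 * (1 - p20)**f1 * p10**s2 * (1 - p10)**f2
    return like_h1 / (like_h1 + like_h2)


def likelier_better_arm(p10, p20, s1, f1, s2, f2):
    """Arm (1 or 2) more likely, a posteriori, to have the larger probability."""
    post = posterior_h1(p10, p20, s1, f1, s2, f2)
    prob_arm1_better = post if p10 > p20 else 1 - post
    return 1 if prob_arm1_better >= 0.5 else 2   # ties go to arm 1 (assumption)


def play(p1, p2, p10, p20, n_trials, seed=0):
    """Simulate n_trials pulls with true success rates (p1, p2)."""
    rng = random.Random(seed)
    counts = {1: [0, 0], 2: [0, 0]}  # arm -> [successes, failures]
    total = 0
    for _ in range(n_trials):
        arm = likelier_better_arm(p10, p20, *counts[1], *counts[2])
        success = rng.random() < (p1 if arm == 1 else p2)
        counts[arm][0 if success else 1] += 1
        total += success
    return total
```

For example, `play(0.7, 0.3, 0.7, 0.3, 100)` simulates 100 trials when
$H_1$ is true and returns the number of successes obtained.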


As a prototype this problem was attacked vigorously, but the
results implied that this problem failed as a useful prototype, at
least in its immediate interpretation. The main result, which was
surprisingly difficult to establish [F5], always calls for the use
of the arm most likely to have the higher probability and hence does
not yield a useful comparison of cost with information. The
variations of this problem where this result does not apply did not
seem to have any clearly generalizable interpretation. These
variations involve imposing different prior distributions on $(p_1, p_2)$.
Note that the original problem corresponds to a two-point prior
distribution with probability allocated to the two points $(p_{10}, p_{20})$
and $(p_{20}, p_{10})$.

A problem which is currently of considerable interest in pattern
recognition problems is fundamentally related to sequential design of
experiments, although strictly speaking there may be no novel
experimentation. Here the question becomes one of which functions of the
already collected data should be studied. For example, one may have
samples of cardiograms for normal people and for people having had
heart attacks. One may wish to develop a method of classifying a
given cardiogram into one of these two categories. What aspect of
the cardiogram should one study? One may select first some simple
function of the data (called a feature in the pattern recognition
literature). To the extent that the use of this feature can only
do part of the job of classifying, one may attempt to look for
additional features sequentially. Although the data are completely
available, the process of selecting new features is equivalent to the
carrying out of additional experiments, as is practiced by the physician
who diagnoses an illness by a succession of "tests". Both of these
cases have one aspect in common which separates them from the main
body of the literature on sequential design of experiments. In both
of these the result of the $n$-th "experiment" is statistically dependent
on the previous results. However, most of the literature in sequential
design of experiments concentrates on problems where once the
experiment is selected, its outcome is independent of the past. Indeed
an experiment can be repeated (independently) several times in such
problems, whereas a repetition is useless (except to correct for
experimental error) in the cardiogram and diagnosis-type problems.

The literature in sequential design contains two broad types of
general approach and several major classes of applications. One type
of general approach is that of stochastic approximation. Three
variations are the Robbins-Monro methods, the Box-Wilson response surface
methods, and the up-and-down methods. These variations apply to the
estimation of characteristics of a regression function and use the data
to determine the next level of the independent variable at which to
measure the dependent variable. Typically no attention is paid to
a stopping rule. The other general approach consists of finding
optimal or asymptotically optimal designs, generally in a Bayesian
decision theoretic context.

Special classes of applications, about some of which little will
be said here, are (1) survey sampling, (2) multilevel continuous
sampling inspection, (3) selecting the largest of k populations, (4)
screening experiments, (5) group testing, and (6) search problems.
While one would expect Monte Carlo sampling to be one of these classes,
the literature seems to lack interest in the sequential selection
of simulation experiments. There are a few miscellaneous categories
such as "forcing experiments to be balanced" and some process control
problems which also deserve mention.

This  paper  consists  of  two  major  parts.  One  is  devoted  to  the 
more  general  approaches,  the  other  to  the  classes  of  applications. 


2.  Stochastic  Approximation 

The Robbins-Monro [R4] method applies to the following problem.
Corresponding to a choice $x$ of the "independent variable", one
observes the dependent variable $Y(x)$ with non-decreasing expectation
$M(x) = E[Y(x)]$. It is desired to estimate $\theta$, that value of $x$
for which $M(x) = \alpha$ for some specified value $\alpha$. Starting with an
initial guess $x_1$, successive choices $x_2, x_3, \ldots$ are made according
to

$$x_{n+1} = x_n - a_n [Y(x_n) - \alpha]$$

for some specified sequence $\{a_n\}$. The sequence $\{x_n\}$ serves both
as the successive estimates of $\theta$ and as the experimental levels of
$x$. Since $Y(x_n) - \alpha$ tends to reflect how far $x_n$ is from $\theta$,
the above iteration represents a correction for overestimates or
underestimates. The $\{a_n\}$ sequence represents the extent of the
correction. If the $a_n$ were bounded away from zero, the successive
terms would tend to fluctuate by an amount determined in part by the
variance of $Y(x_n)$. If the $a_n \to 0$ too rapidly, the corrections
might not build up fast enough to correct for an initial error.
However, if $a_n \to 0$ at a suitable rate, it is possible to show that
$x_n \to \theta$ with probability one under weak assumptions concerning the
distribution of $Y(x)$. There is an extensive literature to this
effect which indicates that the method requires little but that
$M(x) > \alpha$ for $x > \theta$ and $M(x) < \alpha$ for $x < \theta$.
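
As a minimal sketch of the Robbins-Monro iteration, the code below
applies the correction $x_{n+1} = x_n - a_n[Y(x_n) - \alpha]$ with the
particular choice $a_n = c/n$. The linear Gaussian response used to
generate observations, and all function and parameter names, are
hypothetical and serve only to make the example runnable.

```python
import random


def robbins_monro(observe, alpha, x1, c, n_steps):
    """Iterate x_{n+1} = x_n - a_n * (Y(x_n) - alpha) with a_n = c / n."""
    x = x1
    for n in range(1, n_steps + 1):
        y = observe(x)                 # one noisy observation at the current level
        x = x - (c / n) * (y - alpha)  # correct for over- or underestimation
    return x


def make_linear_response(beta, delta, sigma, seed=0):
    """Hypothetical response Y(x) = beta * x + delta + Gaussian noise."""
    rng = random.Random(seed)
    return lambda x: beta * x + delta + rng.gauss(0.0, sigma)
```

With `beta = 2`, `delta = 1` and `alpha = 5` the root is $\theta = 2$, and
`robbins_monro(make_linear_response(2.0, 1.0, 1.0), 5.0, 0.0, 0.5, 5000)`
should return a value close to 2.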

While very little is required of the sequence $\{a_n\}$, what does
seem remarkable is that with a proper choice of $\{a_n\}$ this method,
which confuses design level with estimate and which ignores the past
except for the last estimate and the number of observations, is
asymptotically efficient. Hodges and Lehmann [H6] have shown that if
$Y(x)$ has mean $M(x) = \beta x + \delta$ and constant variance $\sigma^2$, and
$a_n = c/n$, then $\theta = \beta^{-1}(\alpha - \delta)$ and

$$E(x_{n+1} - \theta)^2 \approx \frac{\sigma^2 c^2}{n(2\beta c - 1)} .$$

It follows that if $c = \beta^{-1}$, this method has asymptotic efficiency
one for estimating $\theta$ in the normal linear regression problem where
the slope $\beta$ is known but the y-intercept $\delta$ is not known.
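
The efficiency claim for $c = \beta^{-1}$ can be checked directly in the
linear case. The short derivation below is a sketch under the assumption
that $Y(x_n) = \beta x_n + \delta + \varepsilon_n$ with independent errors
$\varepsilon_n$ of mean zero and variance $\sigma^2$; the symbol
$\varepsilon_n$ and the intercept symbol $\delta$ are notation introduced
here. Since $\alpha = \beta\theta + \delta$, one has
$Y(x_n) - \alpha = \beta(x_n - \theta) + \varepsilon_n$, and therefore:

```latex
\begin{aligned}
x_{n+1} - \theta
  &= \Bigl(1 - \frac{c\beta}{n}\Bigr)(x_n - \theta) - \frac{c}{n}\,\varepsilon_n , \\
\text{and for } c = \beta^{-1}:\qquad
n\,(x_{n+1} - \theta)
  &= (n-1)(x_n - \theta) - \beta^{-1}\varepsilon_n , \\
x_{n+1} - \theta
  &= -\frac{1}{n\beta}\sum_{i=1}^{n}\varepsilon_i ,
\qquad
E(x_{n+1} - \theta)^2 = \frac{\sigma^2}{n\beta^2} .
\end{aligned}
```

The last expression is the variance of the conventional estimate of
$\theta$ based on $n$ observations when the slope is known, which is what
asymptotic efficiency one means in this setting.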
Some reflection based on the following facts will help explain
this result. If the regression is linear and $\beta$ is known, the
efficiency of the conventional estimate $\hat\delta$ of $\delta$ is independent
of the design. Indeed $\hat\delta_n = \bar Y_n - \beta \bar x_n$ has variance
$n^{-1}\sigma^2$, and the corresponding estimate of $\theta$,
$\hat\theta_n = \beta^{-1}[\alpha - (\bar Y_n - \beta \bar x_n)]$, has variance
$n^{-1}\beta^{-2}\sigma^2$. Moreover, if $x_n$ is selected to be $\hat\theta_n$, then
$\hat\theta_{n+1} = \hat\theta_n - (n\beta)^{-1}[Y_n - \alpha]$.
Finally in the case where $\beta$ is not known, the asymptotic variance
of the conventional estimate of $\theta$ is
$n^{-1}\beta^{-2}\sigma^2\{1 + s_n^{-2}(\bar x_n - \theta)^2\}$ where
$s_n^2$ is $n^{-1}\sum_{i=1}^{n}(x_i - \bar x_n)^2$. Thus the results of the
known $\beta$ case can be approximated as long as $\bar x_n - \theta$ is small
compared to $s_n$. In the stochastic approximation case using the
sequence $a_n = c/n$, there is no prior knowledge of $\beta$ to insure that
$c = \beta^{-1}$. However, as data accumulate one would hopefully obtain a
satisfactory estimate of $\beta$ providing the successive $x_n$ are not too
close to each other. This proviso was achieved by Venter [V1] and
Fabian [F1,5] by the expedient of separating the design and estimation
functions of $x_n$. That is, they use $z_n$ as an estimate of $\theta$ and
select two levels $z_n + c_n$ and $z_n - c_n$ at which to draw successive
observations from which an estimate of $M'(\theta)$ is derived as well as
an estimate of $\theta$.

These revised versions of the Robbins-Monro method have some of
the robustness property of the original method. Furthermore, with
regularity conditions under which $M(x)$ is locally linear (and smooth)
with slope $\beta$ at $x = \theta$, $(x_n - \theta)$ is asymptotically normal with
mean 0 and variance $\sigma^2/n\beta^2$ where $n$ is the number of observations.
This is the best one could hope for in the case of normal linear
regression.

The suggestion for separating the design and estimation functions
of $x_n$ was implicit in the earlier generalization of the Robbins-Monro
method by Kiefer and Wolfowitz [K2] to the problem of locating the
value $\theta$ of $x$ at which $M(x)$ achieves a maximum. Just as
$-\mathrm{sgn}[Y(x_n) - \alpha]$ estimates $\mathrm{sgn}(\theta - x_n)$ and points in the
direction of $\theta$ from $x_n$ in the R-M problem, so does $\mathrm{sgn}\, M'(x_n)$
point in the direction of $\theta$ from $x_n$ in the K-W method. Here
$M'(z_n)$ is estimated by $[Y(z_n + c_n) - Y(z_n - c_n)]/2c_n$ and the K-W
method uses

$$z_{n+1} = z_n + \frac{a_n}{c_n}\,[Y(z_n + c_n) - Y(z_n - c_n)]$$

where $a_n, c_n \to 0$ so that $\sum a_n = \infty$, $\sum a_n c_n < \infty$,
$\sum a_n^2 c_n^{-2} < \infty$ (e.g., $a_n = n^{-1}$, $c_n = n^{-1/3}$).
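
A minimal sketch of the Kiefer-Wolfowitz iteration follows, using the
example sequences $a_n = n^{-1}$ and $c_n = n^{-1/3}$ quoted above. The
function name and the quadratic test response in the usage note are
assumptions made for illustration.

```python
def kiefer_wolfowitz(observe, z1, n_steps):
    """Iterate z_{n+1} = z_n + (a_n / c_n) * [Y(z_n + c_n) - Y(z_n - c_n)].

    Uses the example sequences a_n = 1/n and c_n = n**(-1/3).
    """
    z = z1
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n
        c_n = n ** (-1.0 / 3.0)
        # two observations per step, one on each side of the current estimate
        diff = observe(z + c_n) - observe(z - c_n)
        z = z + (a_n / c_n) * diff
    return z
```

For a noise-free response such as `lambda x: -0.5 * (x - 1.0) ** 2`, whose
maximum is at 1, the iterates settle at 1 after the second step; with
noisy observations the convergence is only gradual.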

Venter [V2] and Fabian [F1,3] have also generalized the K-W scheme
to obtain procedures which converge in general but which are
asymptotically optimal if the local behavior of $M(x)$ at $\theta$ is smooth.
This work has been extended to several dimensions. Relatively little
attention in the literature has been paid to stopping rules.

The price paid for the robustness of these methods is that their
behavior depends mainly on the nature of $M(x)$ for $x$ close to $\theta$
and that they do not take advantage of extra knowledge. Thus in problems
where $Y(x)$ depends in a known way upon several unknown parameters, it
could be possible to develop more efficient if less robust sequential
estimation techniques. In particular, for estimating the LD.01 of the
Probit model, the efficiency of the "best" Robbins-Monro method
relative to a locally optimal design is only 64%.

A parallel development to the Robbins-Monro and Kiefer-Wolfowitz
methods was the stochastic approximation method of Box and Wilson [B11],
which gave rise to a literature using the terms "response surface",
"steepest ascent" and "rotatable designs". In this approach, designed
principally for multivariable applications, one observes $Y(x)$ for a
set of points $x$ in $k$-dimensional space. Approximating $EY(x)$ by a plane
surface, one estimates the direction of steepest ascent (gradient)
and moves in that direction. Alternatively, one can approximate
$EY(x)$ by a quadratic surface and estimate the point at which the
quadratic is maximized. At each stage the estimated parameters are
used not only to estimate the location of the maximum but to suggest
another set of values of $x$ at which to take additional observations.
Rotatable designs are a special class of designs used around the
point of interest [B5,6]. The general approach is rather pragmatic
and informal compared with the methods proposed by Robbins and Monro,
Kiefer and Wolfowitz, and Fabian and hence is less amenable to
systematic analysis and evaluation. On the other hand, as these more
formal methods developed they tended to resemble the Box-Wilson approach
more and more.
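
The first-order (plane) step of this approach can be sketched as follows.
The sketch assumes a full two-level factorial design centered at the
current point, which is one common choice and is an assumption made here
rather than a prescription from the text; the function names and the step
length parameter are likewise hypothetical.

```python
import itertools
import math


def steepest_ascent_direction(observe, center, half_width):
    """Estimate the gradient of E[Y(x)] from a two-level factorial design.

    Observations are taken at center +/- half_width in every coordinate;
    because the coded design columns are orthogonal, each first-order
    coefficient is the average of (coded level * response), rescaled.
    """
    k = len(center)
    design = list(itertools.product((-1.0, 1.0), repeat=k))
    responses = [
        observe([c + half_width * s for c, s in zip(center, signs)])
        for signs in design
    ]
    slopes = [
        sum(signs[j] * y for signs, y in zip(design, responses))
        / (len(design) * half_width)
        for j in range(k)
    ]
    norm = math.sqrt(sum(b * b for b in slopes))
    return [b / norm for b in slopes] if norm > 0 else slopes


def ascent_step(observe, center, half_width, step_length):
    """Move the design center a fixed distance along the estimated gradient."""
    direction = steepest_ascent_direction(observe, center, half_width)
    return [c + step_length * d for c, d in zip(center, direction)]
```

Repeating `ascent_step` from the new center gives a crude version of the
sequential procedure; the quadratic-surface variant would replace the
plane fit once the slopes become small.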

A variation of the Box-Wilson approach uses Partan, a method
developed by Shah, Buehler and Kempthorne [S2]. It replaces the
gradient, or steepest ascent, approach by a more sophisticated
variation which combines two successive gradients in a method which is
successful in speeding up convergence for deterministic problems and
is apparently effective in the stochastic problems dealt with here.
A review of the literature on response surface methodology was given
by Hill and Hunter [H5].

A somewhat more specialized method of stochastic approximation
applied in quantal response problems is that of the up-and-down method,
introduced by Dixon and Mood [D3]. It is desired to estimate the
dose $x$ for which the probability of response assumes a certain
specified value $\alpha$. The possible dose levels of the experiment are
equally spaced (possibly in a logarithmic scale). If a dose at level $x$
leads to a response, the next dose applied is one step down and if
it does not lead to response, the next dose applied is one step up.
When the investigator terminates sampling, he estimates the parameters
of the model by some method such as maximum-likelihood. A considerable
number of variations of the basic approach have developed. See
[C7, D1, W2]. For quantal response problems, this approach has a
potential advantage over that of the Robbins-Monro method in that the
associated estimation procedure makes use of the specific model applied.
In doing so it of course loses the all-purpose robustness properties
of the Robbins-Monro method.
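
The stepping rule itself is simple enough to state as code. The sketch
below covers only the design part of the method (which dose to use next);
the terminal estimation step, for instance by maximum likelihood, is left
out. The function name, the clamping of the index at the ends of the dose
grid, and the `responds` callback are assumptions introduced for
illustration.

```python
def up_and_down(responds, levels, start_index, n_trials):
    """Return the indices of the dose levels used in n_trials trials.

    `levels` is an increasing list of equally spaced doses; `responds(dose)`
    returns True when the subject responds.  A response moves the next
    dose one step down, a non-response one step up (clamped to the grid).
    """
    index = start_index
    used = []
    for _ in range(n_trials):
        used.append(index)
        if responds(levels[index]):
            index = max(index - 1, 0)
        else:
            index = min(index + 1, len(levels) - 1)
    return used
```

With a deterministic threshold response the sequence climbs to the
threshold and then oscillates around it, which is the behavior that
concentrates observations near the dose of interest.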

3. Optimization Approaches

In principle the problems of sequential design of experiments can,
by assuming a priori probability distributions and cost functions, be
reduced to optimization problems which can be solved by backward
induction. This idea has been exploited by Whittle [W2], who used it to
set up a functional equation in terms of posterior probability
distributions. However, the approach has been effective on very few
rather simple problems. The insight provided by this statement has
limited value in most statistical problems.

It is not uncommon for investigators to use a myopic version of
backward induction. Here the experimenter asks, after the outcome of
each trial, "If I have at most one more experiment to perform, which
if any will I perform?" In many cases this method seems to yield
satisfactory results. I say "seems" because one seldom compares it with
optimal procedures. One case where it has been used is in medical
diagnosis [G2]. In principle this idea is also used in stepwise
regression techniques for building up a good set of predictor variables.

It will be informative to see how this myopic policy works in a
completely different context. To maximize a function $f(x)$, $x \in E^k$,
by the gradient method, one adjusts the $n$-th estimate by

$$x_{n+1} = x_n + h\,\frac{\partial f}{\partial x}(x_n)$$

where $\partial f/\partial x$ represents the gradient, or vector of partial
derivatives with respect to the components of $x$. This method does not
specify the value of the scalar $h$. A special version called the optimal
gradient method selects $h$ to be that value for which
$f(x_n + h\,\frac{\partial f}{\partial x}(x_n))$ assumes its maximum. This can be
regarded as a myopic sequential optimization procedure. Applying it to
the function $f(x) = -(x_1^2 + c x_2^2)$, the gradient is
$(-2x_1, -2c x_2)$ and an initial approximation $x = (x_1, x_2)$ to the point
$(0,0)$ which maximizes $f$ is followed by $x^* = x + h\,\partial f/\partial x$,
where $h = \tfrac{1}{2}[1 + u^{-2}]/[c + u^{-2}]$,
$x_1^* = x_1 (c-1)/(c + u^{-2})$, $x_2^* = -x_2 (c-1)/(1 + c u^2)$, and
$u = c x_2/x_1$. The value of $f$ is reduced by a factor
$f^*/f = (c-1)^2/[(c + u^{-2})(c + u^2)]$. Since
$u^* = c x_2^*/x_1^* = -u^{-1}$, this factor does not change in successive
iterations even though $h$ alternates between the above value and
$h^* = \tfrac{1}{2}[1 + u^2]/[c + u^2]$. On the other hand

$$f(x^*) = -\{[x_1(1 - 2h)]^2 + c\,[x_2(1 - 2ch)]^2\}$$

could be much more rapidly reduced by alternating $h$ between $1/2$ and
$1/2c$.

For this particular function, if we assume no round-off error,
alternating $h$ between $1/2$ and $1/2c$ accomplishes the maximization
in two steps. In general, when $f$ represents only the main term in
the expansion of the function to be maximized, and there are round-off
errors, two iterations will not suffice to reach the maximizing point.
The above example illustrates that the rate of convergence can be faster
than for the myopic policy called the "optimal gradient method" if the
values of $h$ are chosen with due attention to the characteristic roots
of the quadratic form approximating the function to be maximized.
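
The comparison can be verified numerically. The sketch below, whose
function names are introduced here for illustration, implements the
quadratic example $f(x) = -(x_1^2 + c x_2^2)$, the myopic ("optimal
gradient") step length, the predicted reduction factor, and a plain
gradient step with a prescribed $h$. It assumes both coordinates of the
starting point are nonzero, since $u = c x_2/x_1$ must be finite and
nonzero.

```python
def f(x, c):
    """Quadratic example f(x) = -(x1**2 + c * x2**2), maximized at (0, 0)."""
    return -(x[0] ** 2 + c * x[1] ** 2)


def gradient_step(x, c, h):
    """One step x + h * grad f(x), with grad f = (-2 x1, -2 c x2)."""
    return (x[0] + h * (-2.0 * x[0]), x[1] + h * (-2.0 * c * x[1]))


def optimal_h(x, c):
    """Step length maximizing f along the gradient (the myopic choice)."""
    u = c * x[1] / x[0]
    return 0.5 * (1.0 + u ** -2) / (c + u ** -2)


def reduction_factor(x, c):
    """Predicted f*/f = (c - 1)^2 / [(c + u^-2)(c + u^2)] for the myopic step."""
    u = c * x[1] / x[0]
    return (c - 1.0) ** 2 / ((c + u ** -2) * (c + u ** 2))
```

With $c = 10$ and $x = (1, 0.1)$, so that $u = 1$, each myopic step only
multiplies $f$ by $81/121$, whereas
`gradient_step(gradient_step(x, c, 0.5), c, 1 / (2 * c))` lands on the
maximizing point up to rounding.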

Two slightly less myopic policies which are probably more effective
and correspondingly more difficult to execute are the following:
(1) Look two steps ahead into the future. This involves the mathematics
of a two-step backward induction at each stage. (2) At each step
ask whether there is an experiment $e$ and a number of repetitions $m$
so that the statistician would prefer $m$ independent repetitions
of $e$ to any other $(e^*, m^*)$ and to stopping. If so, select $e$ for
the next trial. Apparently until recently this latter approach has
been used only to determine reasonable stopping rules in problems with
no choice of experimentation tU> ©3*. Recently Gittins and Jones [G1]
have used a variation of this idea effectively to gain new insight
in the two-armed bandit problem by evaluating a choice in terms of
how good it would be if we had to use that choice thereafter.

4. Asymptotically Optimal Procedures in Testing Hypotheses

Large sample theory provides useful insight in statistical problems
for two reasons. First, the derivation and simple expression of
appropriate distributions are easiest for sample sizes of 1, 2, and
$\infty$. Second, as sample size becomes large, many different philosophical
approaches lead to results which are similar, and while uniformly
best procedures are generally nonexistent for finite sample size,
asymptotically optimal procedures do exist. It was hoped that large
sample theory would provide insights which might permit one to bypass
the need for backward induction. As we shall see later, this is
relatively trivial in estimation problems where locally optimal designs
yield relatively efficient procedures easily.
In testing problems, the situation seems more difficult. But
even here simple asymptotic results yield useful insights. In
sequentially testing a simple hypothesis $H_1: \theta = \theta_1$ versus a simple
alternative $H_2: \theta = \theta_2$ where the successive observations
$X_1, X_2, \ldots, X_n, \ldots$ are i.i.d. with density $f(x, \theta)$ the admissible
procedures are the sequential probability-ratio tests with limits $A$
and $B$, $B < 1 < A$, on the likelihood-ratio
$\lambda_n = \prod_{i=1}^{n} [f(X_i \mid \theta_1)/f(X_i \mid \theta_2)]$.
In a Bayesian framework with initial prior probabilities $\xi_1$ and
$\xi_2 = 1 - \xi_1$, a cost of sampling $c$, and regrets for deciding wrong
$r_i = r(\theta_i) > 0$, $i = 1, 2$, the Bayes procedure is determined by
appropriate limits $A(\xi_1, r_1, r_2, c)$ and $B(\xi_1, r_1, r_2, c)$. As the
cost of sampling $c \to 0$, the appropriate sample size $\to \infty$ and this is
derived from the fact that $\log A \to \infty$ and $\log B \to -\infty$. In fact,
$\log A \approx -\log B \approx -\log c$, the posterior risk upon stopping as well
as the posterior probability of being wrong is of the order of magnitude
of $c$ and the expected sample size is given by

$$E(N \mid \theta_1) \approx \frac{-\log c}{I(\theta_1, \theta_2)}, \qquad
E(N \mid \theta_2) \approx \frac{-\log c}{I(\theta_2, \theta_1)},$$

where $I(\theta, \varphi) = E_\theta\{\log[f(X, \theta)/f(X, \varphi)]\}
= \int \log[f(x, \theta)/f(x, \varphi)]\, f(x, \theta)\,dx$
is the Kullback-Leibler information number. Indeed the main
contribution to the risk or expected loss is the cost of sampling, and
this is given by

$$R(\theta_1) \approx \frac{-c \log c}{I(\theta_1, \theta_2)}, \qquad
R(\theta_2) \approx \frac{-c \log c}{I(\theta_2, \theta_1)} .$$

In effect, the importance of the Kullback-Leibler information number
derives from the fact that $I(\theta_1, \theta_2)$ measures how fast the posterior
probability for $\theta_2$ approaches zero when $\theta_1$ is the true state of
nature.
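
The first-order relations above can be illustrated for normal data with
known variance. The sketch below is an illustration under stated
assumptions: it uses the crude boundaries $\log A = -\log B = -\log c$
rather than the exact Bayes limits, and the function names are introduced
here. It computes the Kullback-Leibler number, the approximate expected
sample size $-\log c / I$, and runs the sequential probability-ratio test
on a supplied sequence of observations.

```python
import math


def kl_normal(theta, phi, sigma=1.0):
    """Kullback-Leibler number I(theta, phi) for N(theta, sigma^2) data."""
    return (theta - phi) ** 2 / (2.0 * sigma ** 2)


def approx_expected_sample_size(c, information):
    """First-order approximation -log(c) / I for small sampling cost c."""
    return -math.log(c) / information


def sprt_normal(observations, theta1, theta2, c, sigma=1.0):
    """Sequential probability-ratio test with log A = -log B = -log c.

    Returns (accepted hypothesis, number of observations used), or
    (None, n) if the data run out before a boundary is reached.
    """
    bound = -math.log(c)
    log_ratio = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        # log of f(x | theta1) / f(x | theta2) for normal densities
        log_ratio += ((x - theta2) ** 2 - (x - theta1) ** 2) / (2.0 * sigma ** 2)
        if log_ratio >= bound:
            return 1, n
        if log_ratio <= -bound:
            return 2, n
    return None, n
```

With $\theta_1 = 1$, $\theta_2 = 0$ and $c = 10^{-3}$ the approximation gives
about 13.8 observations, and a stream of observations all equal to 1
(the mean under $H_1$) stops the test at the 14th observation.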

This single result for sequentially testing simple hypotheses where
there is no choice of experimentation suggests that if one had a choice
of experiments to perform at each stage, the appropriate choice would
depend on $I(\theta_1, \theta_2, e)$, the Kullback-Leibler number corresponding to
data from experiment $e$. Indeed if $I(\theta_1, \theta_2, e_1) > I(\theta_1, \theta_2, e_2)$
and $I(\theta_2, \theta_1, e_1) > I(\theta_2, \theta_1, e_2)$ it seems clear that $e_1$ is
preferable to $e_2$. But if the last inequality is reversed, then $e_1$ is
preferable to $e_2$ only if $H_1$ is true. The obvious implication is that
if the data strongly suggests $H_1$ is true, one should select the next
experiment to maximize $I(\theta_1, \theta_2, e)$ provided the evidence is not so
overwhelming that it pays to stop sampling.

Suppose now that we move to the more complex problem which involves
composite hypotheses with a fixed experiment. The simplest case is
where $H_1: \theta = \theta_1$ and $H_2: \theta = \theta_2$ or $\theta_3$. Suppose
$\theta_1, \theta_2, \theta_3$ start out with initial prior probabilities
$\xi_{10}, \xi_{20}, \xi_{30}$. After $n$ observations the posterior probabilities
are $\xi_{1n}, \xi_{2n}$, and $\xi_{3n}$, and assuming $\theta_1$ is true,

$$\xi_{2n} \sim e^{-n I(\theta_1, \theta_2)}, \qquad \xi_{3n} \sim e^{-n I(\theta_1, \theta_3)} .$$

Thus the rate at which the posterior probability of $H_2$ approaches
zero is determined by the minimum of $I(\theta_1, \theta_2)$ and $I(\theta_1, \theta_3)$. This
observation leads to the following suggested procedure for the more
general problem of testing the composite hypotheses $H_1: \theta \in \omega_1$ vs.
$H_2: \theta \in \omega_2$ when there is a choice of experiments. Stop sampling after
the $n$-th observation if the posterior probability of one of the hypotheses
is of the order of magnitude of $c$ (or if the posterior risk of
stopping and making a terminal decision is of this order). Otherwise
select the next experiment $e$ to maximize

$$\inf_{\varphi \in a(\hat\theta_n)} I(\hat\theta_n, \varphi, e)$$

where $\hat\theta_n$ is the maximum likelihood estimate of $\theta$ and $a(\theta)$, the
alternate hypothesis to $\theta$, is defined by

$$a(\theta) = \begin{cases} \omega_2 & \text{if } \theta \in \omega_1, \\ \omega_1 & \text{if } \theta \in \omega_2 . \end{cases}$$

It should be noted that $e$ is selected from among the class of
randomized experiments, and it has been assumed that each of these
experiments has the same low cost $c$. If the cost per experiment varies,
then one deals with information per unit cost rather than information.
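
The selection rule can be sketched for a finite parameter set and a
finite set of elementary experiments. Everything specific below is
hypothetical: Bernoulli responses, the table `success_prob`, and the
function names are introduced only for illustration. A randomized
experiment is represented by selection probabilities over elementary
experiments, and its Kullback-Leibler number is the correspondingly
weighted average.

```python
import math


def kl_bernoulli(p, q):
    """Kullback-Leibler number for Bernoulli(p) data against Bernoulli(q)."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total


def min_information(theta, alternatives, weights, success_prob):
    """inf over phi in a(theta) of I(theta, phi, e) for a randomized e.

    `weights` maps elementary experiments to their selection probabilities;
    the information of the randomized experiment is the weighted average.
    `success_prob[e][theta]` is the (hypothetical) Bernoulli response rate.
    """
    return min(
        sum(w * kl_bernoulli(success_prob[e][theta], success_prob[e][phi])
            for e, w in weights.items())
        for phi in alternatives
    )


def best_elementary_experiment(theta, alternatives, success_prob):
    """Elementary experiment maximizing the minimum information."""
    return max(
        success_prob,
        key=lambda e: min_information(theta, alternatives, {e: 1.0}, success_prob),
    )
```

In a configuration where each elementary experiment separates the
estimated state from only one of two alternatives, every elementary
experiment has minimum information zero while an equal mixture does not,
which suggests why the class of randomized experiments is the natural one
to search over.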

The method suggested above was shown to be asymptotically optimal
under mild conditions [C1, C^] as $c \to 0$ in the sense that for each $\theta$
it yields a risk

$$R(\theta) \approx \frac{-c \log c}{I(\theta)}$$

where

$$I(\theta) = \sup_{e \in \mathcal{E}^*} \inf_{\varphi \in a(\theta)} I(\theta, \varphi, e)$$

and $\mathcal{E}^*$ is the class of randomized experiments derived from the
class $\mathcal{E}$ of available or "elementary" experiments. Moreover for
any alternative procedure to do better for some value of $\theta$, it
must do worse by an order of magnitude for some other value of $\theta$.

This result was first proved for the case where $\omega_1$, $\omega_2$ and
$\mathcal{E}$ were finite. Bessler [B2] extended the result to the case where
$\mathcal{E}$ is infinite and the problem of choosing between two hypotheses
could be replaced by a choice among $k$ actions. Albert [A1] extended this
result further to the case where the hypothesis spaces $\omega_1$, $\omega_2$ may
be infinite sets.

Here a fundamental difficulty appeared. In such a simple problem
as testing whether the probability of response to one drug is greater
than for another drug, the two hypothesis spaces are adjacent to one
another and $I(\theta)$ vanishes on the boundary. Then the asymptotic
optimality breaks down. Heuristics indicated that the difficulty
arises more from the stopping rule than the experimental design aspect
of the problem, and G. Schwarz [S1] attacked that problem by studying
optimal  sequential  procedures  for  testing  that  the  mean  u  of  a  nor¬ 
mal  distribution  is  versus  the  alternative  that  it  is  when 

it  is  possible  that  u  could  be  tp  or  0  .  In  the  latter  case  it 
doesn't  matter  what  terminal  decision  is  made.  His  results  extended 
to asymptotically optimal and Bayes results for testing that the mean
exceeds $\mu_1$ versus the alternative that it is less than $\mu_2$
($\mu_2 < \mu_1$) when it is possible that $\mu_2 < \mu < \mu_1$, in which case
either decision is equally satisfactory. In other words, this is the case
of an indifference zone. Here asymptotically Bayes procedures consist
of stopping when the posterior risk of stopping and making a terminal
decision is $O(c)$ and yield overall risks of order $O(-c \log c)$.

Finally Kiefer and Sacks [K1] combined these results to obtain an
asymptotically optimal procedure for problems in sequential design where
the parameter points for which various actions are preferred are separated
by indifference zones. In these results the key information number is
expressed by

$$I(\theta) = \sup_{e \in \mathcal{E}^*} \sup_{i \in Q_\theta} \inf_{\varphi \notin \omega_i} I(\theta, \varphi, e)$$

where $\omega_i$ is the set of $\theta$'s on which the $i$-th action is optimal, and
$Q_\theta$ is the set of $i$ for which the $i$-th action is optimal when $\theta$ is
the true state of nature. (In the two action problems, $Q_\theta = \{1, 2\}$
for $\theta$ in the indifference zone.) The appropriate experiment is the
randomized experiment $e \in \mathcal{E}^*$ which yields $I(\theta)$ as the supremum in
the above expression and

$$R(\theta) \approx \frac{-c \log c}{I(\theta)} .$$

Both the proof and the method are simplified considerably in the
Kiefer and Sacks paper where a two-stage sampling procedure is used.
An initial large sample of size $o(-\log c)$ is followed by an estimate
of $\theta$ and a second sample of appropriate size on an appropriate choice
of $e$.
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In  principle  this  approach  is  extremely  successful  in  bypassing 
the  need  for  backward  induction.  Asymptotically  optimal  results  are 
obtained  with  recourse  only  to  Kullback-Leibler  information  numbers 
and  likelihood-ratio  statistics.  However,  there  are  several  short¬ 
comings.  First,,  the  role  of  indifference  zones  implies  that  the  simple 
problem  cf  deciding  whether  the  mean  u  of  a  normal  distribution  is 
positive  or  negative  with  a  positive  loss  such  as  juj  attached  to 
the  wrong  decision  is  not  covered.  Second,  the  approach  is  very  coarse 
for  moderate  sample  size  problems.  Indeed  the  Kiefer-Sacks  twc-stage 
variation  sidesteps  the  issue  of  how  to  experiment  in  the  early  stages 
whereas  the  original  Chernoi'f  approach  simply  treats  the  estimate  cf 
9  based  on  a  few  observations  with  as  much  respect  as  that  based  on 
many  observations. 

On top of these shortcomings the asymptotic analysis distinguishes
sharply between terms of order of magnitude of $c$ and of $c \log c$,
whereas the difference in most applied examples may be less than overwhelming.
(A proper analysis should pay more attention to the fact
that $\log c$ is dimensionally wrong. The quantity $c$ should be normalized
approximately with respect to the costs of making the wrong
decision. This normalization occurs naturally if one stops when the
posterior risk of stopping is of the order of the cost of stopping.)

In addition to this approach, alternative procedures have been
proposed by Lindley [L2], DeGroot [D2], and Box and Hill [B7]. For
example Lindley suggested measuring the value of an experiment in
terms of the Shannon information or entropy. One may select at each
stage the experiment for which the expected reduction in entropy is a
maximum. To be more specific, if $\theta_1, \ldots, \theta_k$ are the possible states
of nature among which one must decide and $\xi_i$ is the prior probability
of $\theta_i$, the entropy is $-\sum_i \xi_i \log \xi_i$. After an experiment $e$ yielding
$X$, the prior probabilities $\xi_i$ are replaced by $\xi_i^*$ proportional to
$\xi_i f(X \mid \theta_i, e)$ and the reduction in entropy is

$$\sum_i \left[ \xi_i^* \log \xi_i^* - \xi_i \log \xi_i \right]$$

whose expectation may be computed to be

$$\sum_i \xi_i I(\theta_i, \theta^*, e)$$

where $\theta^*$ corresponds to an ideal distribution with density $\sum_i \xi_i f(x \mid \theta_i, e)$.
Box and Hill started with the same approach, but to simplify the
calculus approximate the expected reduction in entropy by an upper bound
which they proceed to use to select the next experiment. Neither of
these approaches is asymptotically optimal except in special "symmetric"
problems. One may expect the Box-Hill approach to fail to be optimal
because it is only an approximation to the method proposed. Apparently
the Lindley approach, which seems more reasonable, fails because a
myopic one-stage-ahead policy cannot be depended on for optimality as
was seen in the illustration of the optimal gradient method.

On the other hand Meeter, Pirie and Blot [M2] carried out
extensive Monte Carlo experimentation on two problems. These were
to select the single odd coin in a group of $k$ coins and to identify
three normal populations with common variance if the values of the
three means are specified but the appropriate order is not known.
In both of these the Box-Hill approach did better than the Chernoff
approach for sample sizes that were limited by a stopping rule which
led roughly to error probabilities of .05. Apparently the difficulty
with the asymptotically efficient approach of Chernoff was that
initial experimentation has a potential for concentrating on
noninformative experiments, which seems to show up in these examples.
Blot and Meeter [B4] subsequently attempted to develop an alternative
which would be asymptotically optimal and effective in the early stages.
Their method seems to be effective in a special class of problems.
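
The entropy criterion discussed above can be illustrated numerically. The following is a minimal sketch, not taken from the paper, for an experiment with finitely many outcomes. Here `likelihoods[i][x]` plays the role of the density of outcome x under the i-th state of nature, and all function names and the numerical example are hypothetical. The sketch checks that the expected reduction in entropy equals the prior-weighted sum of Kullback-Leibler numbers against the mixture density, and compares it with a pairwise sum that bounds it from above by convexity of the Kullback-Leibler number. Whether that pairwise sum is exactly the bound used by Box and Hill is an assumption here.

```python
import math

def entropy(probs):
    """Shannon entropy -sum p log p (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl(f, g):
    """Kullback-Leibler number sum f log(f/g) for discrete densities."""
    return sum(a * math.log(a / b) for a, b in zip(f, g) if a > 0)

def mixture(prior, likelihoods):
    """Density of the outcome averaged over the prior (the 'ideal' density)."""
    n_outcomes = len(likelihoods[0])
    return [sum(xi * f[x] for xi, f in zip(prior, likelihoods))
            for x in range(n_outcomes)]

def expected_entropy_reduction(prior, likelihoods):
    """Average over outcomes of (prior entropy - posterior entropy)."""
    mix = mixture(prior, likelihoods)
    total = 0.0
    for x, m in enumerate(mix):
        if m == 0:
            continue
        posterior = [xi * f[x] / m for xi, f in zip(prior, likelihoods)]
        total += m * (entropy(prior) - entropy(posterior))
    return total

def information_form(prior, likelihoods):
    """Prior-weighted Kullback-Leibler numbers against the mixture density."""
    mix = mixture(prior, likelihoods)
    return sum(xi * kl(f, mix) for xi, f in zip(prior, likelihoods))

def pairwise_upper_bound(prior, likelihoods):
    """Convexity bound (assumed form): sum_i sum_j xi_i xi_j KL(f_i, f_j)."""
    return sum(xi * xj * kl(fi, fj)
               for xi, fi in zip(prior, likelihoods)
               for xj, fj in zip(prior, likelihoods))

# Hypothetical example: three states of nature, three possible outcomes.
PRIOR = [0.5, 0.3, 0.2]
LIKELIHOODS = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]

print(expected_entropy_reduction(PRIOR, LIKELIHOODS))
print(information_form(PRIOR, LIKELIHOODS))
print(pairwise_upper_bound(PRIOR, LIKELIHOODS))
```

A myopic rule of the Lindley type would evaluate `expected_entropy_reduction` for each available experiment and pick the largest value; the pairwise bound is cheaper to reason about but can rank experiments differently.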

At this time the major theoretical problems seem to be the problem
of no-indifference zone and finding effective methods of experimentation
at the early stages of sampling. For the problem of no-indifference
zone, the problem of deciding the sign of a normal mean
was used as a prototype on the ground that its solution could be extended
via the logarithm of the likelihood-ratio to more general situations. Although
this work was done in the context of no experimental choice, one consequence
is of some interest here. Consider the problem where the
cost of deciding wrong is $k|\mu|$ and the cost per observation is
$c \to 0$. Then using Bayes procedures the risk for non-sequential procedures
is $R(\mu) = O(c^{1/2})$. The risk for the optimal sequential
procedure is $R(\mu) = O(c^{2/3})$. Hald and Keiding [H1, H2] have shown
that the risks for $k$-stage sampling procedures satisfy
$R_k(\mu) = O(c^{\gamma_k})$ up to a factor that is a power of $\log c$,
where $\gamma_k = (2^k - 1)/(3 \cdot 2^{k-1} - 1)$.
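
As a small check on the exponent for k-stage procedures, which reads as (2^k - 1)/(3 * 2^(k-1) - 1), the sketch below tabulates it with exact rational arithmetic. The function name `gamma` is hypothetical. One would expect the one-stage value to match the non-sequential exponent 1/2 and the values to climb toward the fully sequential exponent 2/3.

```python
from fractions import Fraction

def gamma(k):
    """Exponent for k-stage sampling: (2^k - 1) / (3 * 2^(k-1) - 1), k >= 1."""
    return Fraction(2 ** k - 1, 3 * 2 ** (k - 1) - 1)

for k in (1, 2, 3, 5, 10):
    print(k, gamma(k), float(gamma(k)))
```

The gap to 2/3 works out to 1/(3(3 * 2^(k-1) - 1)), so it roughly halves with each added stage, which suggests that a few stages already capture most of the asymptotic gain of full sequential sampling.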


In  Estimation 


As a preliminary to this section we mention results in two types
of problems. For sequentially estimating the mean of a normal distribution
with known variance $\sigma^2$, using squared error loss and constant
cost $c$ per observation, the optimal sample size $n_0$ is obtained by
minimizing $cn + k\sigma^2/n$. Thus $n_0$ is $(k\sigma^2/c)^{1/2}$, and the optimal
risk is $2(ck\sigma^2)^{1/2}$. If the variance $\sigma^2$ is not known, an approach
suggested by Robbins [R3] consists of sampling until the sample size
$n$ exceeds both 2 and the current estimate of $(k\sigma^2/c)^{1/2}$. Thus we stop
when $n \geq 3$ and

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 \leq c k^{-1} n^2 (n-1).$$

Results of Starr and Woodroofe [S7] indicate that the difference between
the optimal risk and that for this procedure is $O(c)$, i.e., the cost
of not knowing the nuisance parameter $\sigma$ is equivalent to that of
a finite number of observations. (This cost is about the cost of one
observation unless $c$ is extremely small, in which case these observations
are excessive.) Alvo [A2] has attained precise bounds in a
Bayesian context. The point of this discussion is that in estimating,
one can expect to do very well using rather simple ideas. That is,
it is easy to find procedures which achieve risks which are
$2(ck\sigma^2)^{1/2} + O(c)$, where the first term would be optimal when $\sigma^2$
is known. A nontrivial notion of asymptotic optimality must attempt
to minimize the $O(c)$ term. On the other hand, the practical use for
such a nontrivial optimality may not be great.
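
The Robbins-type rule described above is simple enough to simulate. The sketch below is illustrative only: the function names, the parameter values, and the use of a running (Welford) update for the sum of squares are choices made here, not details from the paper. It computes the known-variance optimum and then applies the stopping rule, which stops once n is at least 3 and the sum of squared deviations is at most c n^2 (n - 1) / k.

```python
import math
import random

def known_variance_optimum(c, k, sigma):
    """Optimal fixed sample size and risk when sigma is known."""
    n0 = math.sqrt(k * sigma ** 2 / c)
    return n0, 2.0 * math.sqrt(c * k * sigma ** 2)

def robbins_stopping_time(draw, c, k, max_n=1_000_000):
    """Sample until n >= 3 and sum (x - mean)^2 <= c n^2 (n - 1) / k."""
    n, mean, m2 = 0, 0.0, 0.0
    while n < max_n:  # max_n is only a safety cap (an assumption)
        x = draw()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # running sum of squared deviations
        if n >= 3 and m2 <= c * n * n * (n - 1) / k:
            break
    return n, mean

# Hypothetical setting: sigma = 1 is unknown to the procedure.
rng = random.Random(0)
C, K, SIGMA = 1e-4, 1.0, 1.0
n0, risk0 = known_variance_optimum(C, K, SIGMA)
stops = [robbins_stopping_time(lambda: rng.gauss(0.0, SIGMA), C, K)[0]
         for _ in range(200)]
avg_stop = sum(stops) / len(stops)
print(n0, risk0, avg_stop)
```

In runs of this kind the average stopping time should sit close to the known-variance optimum, in line with the statement that ignorance of the variance costs only the equivalent of a bounded number of observations.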

A second result concerns one-armed bandit problems. This may be
stated as follows. Let $X_1, X_2, \ldots$ be independent observations on
a random variable $X$. A player who plays $n \leq N$ times collects
$X_1 + X_2 + \cdots + X_n$, whose expectation is $nE(X)$. Determine $n$ sequentially
to maximize the expected payoff, which is $E(n)E(X)$. If
$E(X) > 0$, it pays to play $N$ times, and the expected payoff is
$NE(X)$. If $E(X) < 0$, it pays not to play. Chernoff and Ray [C6]
have given a characterization for the solution of the normal version
where the $X_i$ are normal with unknown mean $\mu$ and known variance $\sigma^2$,
and $\mu$ has a specified normal prior distribution, and $N$ is large.
Here it is shown that the expected loss due to ignorance of the sign
of $\mu$ is of the order of magnitude of $(\log N)^2$. One may conjecture
that the two-armed bandit problem would share this property.

A number of papers in optimal design approach the sequential estimation
problem from a myopic iterative point of view without much attention
to stopping rules [B8, F6, E2, S5, S6]. For example, consider
the normal, linear regression problem with

$$Y = \theta' x + u$$

where $u$ is normal with mean 0 and constant variance 1, and
where $x$ may be selected from some compact set $S$. The covariance
matrix of the estimates of $\theta$ based on $n$ observations corresponding
to $x_1, x_2, \ldots, x_n$ is

$$\Sigma_n = \left[ \sum_{i=1}^{n} x_i x_i' \right]^{-1}.$$

One approach is to select the $(n+1)$st experiment, i.e., $x_{n+1}$, to
minimize the generalized variance, $|\Sigma_{n+1}|$. Since

$$\Sigma_{n+1}^{-1} = \Sigma_n^{-1} + J_{n+1}$$

where $J_{n+1} = x_{n+1} x_{n+1}'$ is the Fisher information contributed by the
$(n+1)$st observation and is of rank one, the matrix identity

$$(A + xx')^{-1} = \left[ I - \frac{A^{-1} x x'}{1 + x' A^{-1} x} \right] A^{-1}$$

facilitates the minimizing calculation. The iteration involved is
independent of the actual data observed and is also used to calculate
fixed sample size designs which minimize the generalized variance.
See also [M3]. Minor variations of this basic idea apply Bayesian
notions and can be used in nonlinear problems.
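
The myopic iteration can be sketched in a few lines for a two-parameter model. The example below is an illustration under stated assumptions, not the procedure of any cited paper: the regression vector is taken to be (1, x) with x restricted to a small grid on [-1, 1], and a small ridge term makes the initial information matrix invertible (a Bayesian-flavored device). Because det(A + xx') = det(A)(1 + x'A^{-1}x), minimizing the generalized variance at each step amounts to choosing the candidate with the largest x'A^{-1}x, and the rank-one identity above updates the inverse.

```python
def features(x):
    """Regression vector for a straight-line model (assumed here): (1, x)."""
    return (1.0, x)

def quad_form(inv, f):
    """Compute f' A^{-1} f for a 2x2 inverse information matrix."""
    return sum(f[i] * inv[i][j] * f[j] for i in range(2) for j in range(2))

def rank_one_update(inv, f):
    """Inverse of (A + f f') from the inverse of A via the rank-one identity."""
    v = [sum(inv[i][j] * f[j] for j in range(2)) for i in range(2)]  # A^{-1} f
    denom = 1.0 + sum(f[i] * v[i] for i in range(2))
    return [[inv[i][j] - v[i] * v[j] / denom for j in range(2)]
            for i in range(2)]

def myopic_d_optimal(candidates, steps, ridge=1e-3):
    """Greedy design: each step picks the x that most reduces |Sigma_{n+1}|."""
    inv = [[1.0 / ridge, 0.0], [0.0, 1.0 / ridge]]  # inverse of ridge * I
    chosen = []
    for _ in range(steps):
        best = max(candidates, key=lambda x: quad_form(inv, features(x)))
        chosen.append(best)
        inv = rank_one_update(inv, features(best))
    return chosen

GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical design region S
design = myopic_d_optimal(GRID, 6)
print(design)
```

Note that no observations enter the calculation, which mirrors the remark that the iteration is independent of the data. In this toy case the greedy rule only ever selects the two endpoints, which is also what the fixed sample size D-optimal design for a straight line does.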

This approach has two shortcomings. First, the emphasis on the
criterion of generalized variance is deplorable. While the criterion
of minimizing the generalized variance has the aesthetic property of
leading to invariance of optimality under linear transformations of
the parameter space, this elegant mathematical property simply disguises
the underlying fact that the criterion has no basic statistical
justification and simply delegates the scientist's responsibility
of selecting the criterion to the vagaries of the mathematical structure
of the problem. Thus in a probit model where one is primarily
interested in the LD5 and only slightly interested in the LD50,
the use of the generalized variance criterion leads to an efficiency
of as little as .56.

It is true that in the linear regression problems where one is concerned
with all the unknown parameters, the design which minimizes
the generalized variance also minimizes the maximum variance of the
estimated regression for all $x \in S$ [K3]. However, this min-max
optimality interpretation for interpolation disappears when one is
concerned with a subset consisting of several but not all of the
unknown parameters.

This criticism of the use of generalized variance (i.e., D-optimality)
does not invalidate the general idea of the myopic iteration,
which can also be applied to other criteria. However, the second
shortcoming is that any asymptotic optimality obtained is basically
the cheap one which any locally optimal design attains. What would
be more interesting is a demonstration of a more sensitive optimality
of the sort suggested in our discussion of the Robbins, Starr, Woodroofe,
and Alvo results. But once again it is far from clear that a myopic
policy will be successful in this more delicate task. On the other
hand, one may argue that this task is more of academic than practical
value. Once again the issue centers about what constitutes effective
procedures of cumulating information rapidly in the early stages of
sampling and how important these early stages are. I found little
discussion in the literature which was relevant to this problem. An
exception consists of a paper by myself [C3] and one by Mallik [M1]
which combine the ideas of the bandit problems and the Robbins approach
to sequential estimation. I believe that these point in the correct
direction to assess appropriate orders of magnitude, and a brief discussion
follows.

The two-armed bandit was dismissed early in this paper as a failure
as a prototype example to clarify the problems of sequential design
of experiments. I now propose to disinter it as a problem of theoretical
relevance by considering it in a new context. Incidentally some
theoretical insights have been contributed by Gittins and Jones [G1],
to whom we referred earlier, and by Vogel [V1] and Fabius and Van Zwet
[F4], who studied minimax solutions.

Suppose that there are two instruments which can be used to measure
a parameter $\mu$, but it isn't known which is more accurate. How should
one select between the two instruments, and when ought one to stop
sampling? More specifically, suppose $X$ is normally distributed
with unknown mean $\mu$ and variance $\sigma_1^2$ and $Y$ is normally distributed
with mean $\mu$ and variance $\sigma_2^2$. The cost of sampling is $c$ per unit
observation where $c \to 0$. The cost of estimating incorrectly is
$k(\hat{\mu} - \mu)^2$, where $\hat{\mu}$ is the estimate of $\mu$. In one version of this
problem $\sigma_1^2$ and $\sigma_2^2$ are both unknown. In another we know $\sigma_1^2$
but $\sigma_2^2$ is unknown. Chernoff [C4] approached the first using an
approximation to the solution of the two-armed bandit problem.
Mallik [M1] attacked the other by using the solution of the one-armed
bandit problem. Let us consider this simpler case.

While $\sigma_2^2$ is unknown, it makes sense to take observations on $Y$,
simultaneously obtaining information on $\mu$ and an estimate of $\sigma_2^2$.
One continues until the Robbins-type procedure suggests stopping,
or until the evidence indicates that $\sigma_2 > \sigma_1$, in which case one
estimates how many additional observations from $X$ are advisable
before terminating the sampling process. A careful computation shows
that if $\sigma_1 < \sigma_2$, the loss attributed to taking $n$ observations
from $Y$ before switching is roughly proportional to $n(\sigma_2^2 - \sigma_1^2)$.
If $\sigma_1 > \sigma_2$, the appropriate number of observations is
$n_0 = (k\sigma_2^2/c)^{1/2}$ on $Y$, and a decision to switch to $X$ after $n$
observations leads to a loss of $(n_0 - n)(\sigma_1^2 - \sigma_2^2)$. But in our one-armed
bandit problem the expected loss due to taking $n$ observations when
$\mu < 0$ was $-n\mu$, whereas the expected loss due to taking $n$ observations
when $\mu > 0$ was $(N - n)\mu$. Relating $N$ and $\mu$ to $n_0$ and
$\sigma_1^2 - \sigma_2^2$ suggests Mallik's procedure of applying the solution of the
one-armed bandit to decide when to switch to $X$.
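
The claim that the loss from observing the less accurate instrument grows in proportion to the number of such observations and to the difference of the variances can be checked in an idealized setting. The sketch below assumes, purely for illustration, that both variances are known, that the estimate is the precision-weighted mean, and that sample sizes may be treated as continuous; none of these simplifications or function names come from the paper. Under those assumptions the extra cost of first taking n_y observations on Y works out to c * n_y * (1 - s1^2 / s2^2).

```python
import math

def total_cost(n_y, m_x, s1, s2, c, k):
    """Sampling cost plus k times the variance of the precision-weighted mean."""
    info = n_y / s2 ** 2 + m_x / s1 ** 2
    return c * (n_y + m_x) + k / info

def best_cost_after_y(n_y, s1, s2, c, k):
    """Minimize over the number m_x of later X observations (continuous)."""
    target_info = math.sqrt(k / (c * s1 ** 2))  # optimal total information
    m_x = max(0.0, s1 ** 2 * (target_info - n_y / s2 ** 2))
    return total_cost(n_y, m_x, s1, s2, c, k)

def loss_from_y(n_y, s1, s2, c, k):
    """Extra cost relative to using the better instrument X throughout."""
    return best_cost_after_y(n_y, s1, s2, c, k) - best_cost_after_y(0, s1, s2, c, k)

# Hypothetical values with instrument X more accurate than Y (s1 < s2).
S1, S2, C, K = 1.0, 2.0, 1e-4, 1.0
for n in (5, 10, 20):
    print(n, loss_from_y(n, S1, S2, C, K), C * n * (1 - S1 ** 2 / S2 ** 2))
```

The printed pairs agree, and the loss is linear in n with slope proportional to the difference of the two variances, which is the structure that makes the one-armed bandit analogy plausible.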

Monte Carlo simulations suggest that this method yields a highly
efficient design for sequential experimentation. Theoretical considerations,
supported only partly by the Monte Carlo simulations, indicate
that while losses due to error of estimation are of the order of
magnitude of $c^{1/2}$, the loss attributed to lack of knowledge of
$\sigma_2^2$ is of the order of magnitude of $c(\log c)^2$. This is slightly
larger than the magnitude $O(c)$ achieved in the nondesign problem
of Robbins, Starr, Woodroofe, and Alvo.

6.  Applications 

The ideas of sequential experimentation appear in one form or
another in a variety of fields of application. Some of the most
important ones have extensive literatures, and we barely mention
these. In particular, survey sampling is one field where double
sampling and several-stage sampling have an extensive history.
Indeed the origin of sequential analysis can be traced back to the
double sampling inspection scheme of Dodge and Romig [D5]. In
very few of these fields has a serious attempt been made to explore
optimality from a fundamental point of view. Typically an ad hoc
class of procedures has been proposed, and sometimes the best among
these is characterized. Seldom does one attempt to compare these
with some more generally optimal procedure. Thus one is often in
the dark about the limits of further possible improvements.

7. Multi-Level Continuous Sampling Inspection

An early form of sequential experimentation was in the multi-level
inspection schemes of Dodge [D4]. Lieberman and Solomon [L1]
rephrased some previous ambitious optimization problems to formulate
a simpler but highly relevant problem. Imagine a continuous production
process yielding many items which can be inspected. As the
items pass by, they are inspected with one of several available
probabilities $p_1 = 1 > p_2 > \cdots > p_k$. If a defect is found, the
rate of inspection is increased. If $n_i$ successive non-defects
are found while sampling at level $p_i$, the rate of inspection is
reduced. When the production process turns out items which are defective
independently with constant probability $p$, the "state" of the
inspection system describes a simple stationary Markov process whose
limiting characteristics are easily evaluated. Thus one can compute
the costs and gains of this multi-level inspection scheme for
each $p$. One can easily maintain a minimum level of quality of
output. When the production process goes out of control, this system
seems to respond sensibly. There is one major aspect in which the
Lieberman-Solomon problem differs from the class of problems with
which we have previously been concerned in this paper. Those involved
termination in a finite time. This process is stationary and should
be thought of as going on indefinitely. Indeed this paper initiated
a good deal of subsequent research in Markov decision problems and
constituted an early form of stochastic control.
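
A scheme of this kind is easy to simulate. The sketch below is a hypothetical instance rather than the Lieberman-Solomon plan itself: it assumes that a defect moves the inspector up exactly one level toward 100 percent inspection, that a run of clean inspected items moves down one level, and that the run counter resets at every level change. The function name, the rates, and the run lengths are all illustrative.

```python
import random

def simulate_multilevel(rates, runs, defect_prob, n_items, seed=0):
    """Return the long-run fraction of items inspected.

    rates[i] is the inspection probability at level i (rates[0] == 1.0),
    runs[i] is the number of successive clean inspected items needed at
    level i before the inspection rate is reduced.
    """
    rng = random.Random(seed)
    level, streak, inspected = 0, 0, 0
    for _ in range(n_items):
        if rng.random() < rates[level]:           # this item is inspected
            inspected += 1
            if rng.random() < defect_prob:        # defect found
                level = max(level - 1, 0)         # assumed: tighten one level
                streak = 0
            else:
                streak += 1
                if streak >= runs[level] and level < len(rates) - 1:
                    level += 1                    # relax one level
                    streak = 0
    return inspected / n_items

RATES = [1.0, 0.5, 0.25]   # hypothetical inspection probabilities
RUNS = [5, 5, 5]           # hypothetical clearance run lengths
for q in (0.0, 0.01, 0.05, 0.2):
    print(q, simulate_multilevel(RATES, RUNS, q, 200_000))
```

The fraction inspected rises with the defect probability, which is the sense in which the system responds when the process goes out of control. The same quantity could be obtained exactly from the stationary distribution of the Markov chain on (level, run length) states.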

8. Largest of k Means

As initially formulated [B1] this problem specifies $k$ normal
populations $\Pi_i$, $i = 1, 2, \ldots, k$, with means $\mu_i$ and common known variance
$\sigma^2$. The object is to decide after $n$ observations on each population
which has the largest mean. The natural procedure is to select
the population corresponding to the largest sample mean. The sample
size $n$ is selected so as to assure that the probability of correct
selection attains at least a given value $1 - \alpha$ if the largest population
mean is at least $\delta$ greater than each of the others. Here $\alpha$
and $\delta$ are specified and $n$ is computed as a function of $k$, $\sigma^2$,
$\alpha$, and $\delta$. This computation is relatively trivial since there is
a "least-favorable" configuration of means where

$$\mu_1 = \delta \quad \text{and} \quad \mu_2 = \mu_3 = \cdots = \mu_k = 0.$$
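
The fixed sample size computation can be sketched as follows. Under the least-favorable configuration (one mean equal to delta, the rest equal to 0), the probability of correct selection is the expectation of Phi(Z + delta * sqrt(n) / sigma) raised to the power k - 1, with Z standard normal. The code below evaluates that integral with a plain midpoint rule and searches for the smallest n meeting the requirement; the function names, the integration range, and the step size are choices made here for illustration.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def prob_correct_selection(k, sigma, delta, n, half_width=8.0, step=0.01):
    """P(best sample mean is largest) under the least-favorable configuration."""
    shift = delta * math.sqrt(n) / sigma
    n_steps = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n_steps):
        z = -half_width + (i + 0.5) * step        # midpoint rule
        total += _STD.cdf(z + shift) ** (k - 1) * _STD.pdf(z) * step
    return total

def smallest_n(k, sigma, alpha, delta, max_n=10_000):
    """Smallest per-population sample size with PCS >= 1 - alpha."""
    for n in range(1, max_n + 1):
        if prob_correct_selection(k, sigma, delta, n) >= 1 - alpha:
            return n
    raise ValueError("max_n too small for these settings")

print(smallest_n(2, 1.0, 0.05, 0.5))
print(smallest_n(4, 1.0, 0.05, 0.5))
```

For k = 2 the integral reduces to Phi(delta * sqrt(n / 2) / sigma), so the result can be checked against the closed form n >= 2 (z sigma / delta)^2 with z the upper alpha point of the standard normal.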

The problem of sequential experimentation appears when one may
decide to proceed sequentially. Bessler [B2] applied the theory of
Part XT to obtain a procedure which is asymptotically optimal if one
can assume that the largest mean exceeds all the others by at least
a fixed amount.

This result seems to have been ignored by subsequent workers in
the field who applied sequential sampling where each population is
sampled equally often. Subsequently Guttman [G5] and Paulson [P1]
developed some alternative multi-stage procedures where the results
of each stage were used to discard some populations from further
consideration. Alternative methods have been developed by D. Hoel
[H5] and J.W.H. Swanepoel and J. H. Venter [S7].

There has been an extensive literature extending this problem
to other distributions and other parameters. The variation of the
two-armed bandit problem, where the payoff occurs only after the last
observation and the experimenter decides which is the better arm,
is also an example of this type of problem. The largest of $k$
populations problem corresponds to a $k$-armed bandit problem subject
to two variations. The total sample size is not necessarily fixed.
Also, in dealing with the $k$-armed bandit problem one does not typically
apply the rather artificial criterion of maintaining a minimum probability
of correct selection at configurations of parameters where
the largest exceeds the others by at least a specified $\delta$.

Several variations of the two-armed bandit problem occur in application
contexts. In connection with medical trials where the arms
refer to treatments, various investigators [R2, Z1, S4, C9, F7, F8]
have investigated Play the Winner Rules, which continue the use of a
treatment as long as it is successful and switch when it fails, as
well as other "adaptive" methods. These rules can apply in problems
with an infinite horizon of patients to be treated. On the other hand,
one-armed bandit variations applied to medical trials were discussed
by Chernoff [C4], Colton [C8], and Anscombe [A4]. In Colton's version
drugs are tried alternately until there is an explicit decision that
one is better and the remainder of a horizon of $N$ patients are
treated with the drug that is considered better. The one-armed bandit
problem comes up naturally in a rectified sampling inspection problem
too [ ].
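
The Play the Winner rule as described, stay with a treatment after a success and switch after a failure, can be written down directly. The sketch below is a minimal illustration with hypothetical names and success probabilities; it assumes each patient's outcome is known before the next patient is assigned, which the cited papers may or may not require.

```python
import random

def play_the_winner(success_probs, n_patients, seed=0, start=0):
    """Assign patients by Play the Winner; return (assignments, outcomes)."""
    rng = random.Random(seed)
    arm = start
    assignments, outcomes = [], []
    for _ in range(n_patients):
        success = rng.random() < success_probs[arm]
        assignments.append(arm)
        outcomes.append(success)
        if not success:
            arm = (arm + 1) % len(success_probs)   # switch after a failure
    return assignments, outcomes

# Hypothetical success probabilities for two treatments.
arms, results = play_the_winner((0.8, 0.5), 1000)
print(arms.count(0) / len(arms), sum(results) / len(results))
```

With these numbers roughly 0.5 / (0.5 + 0.2), about 71 percent, of patients should end up on the better treatment in the long run, since the expected run length on each arm is the reciprocal of its failure probability.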

Finally, Hellman and Cover [H3] have exploited randomization in
a finite memory two-armed bandit problem where the observer is restricted
to knowing only the current sample size $n$ and the value
of a $k$-valued function of the past.

9.  Screening  Experiments 

In pharmaceutical research where one seeks drugs which have anti-disease
activity, one must screen many possible candidate chemical
formulations by testing them first on animals. It is important to
devise a system where many drugs are tested and quickly discarded
(because of the expense of testing) unless they show indications of
activity. In that case they are retested more thoroughly. This
procedure passes each drug through several screening experiments,
each more elaborate than the preceding. If the drug passes all of
these, it is regarded as a candidate for further research and testing
on humans. (See [D7, R5].)

10.  Group  Testing 

During World War II it was noted by Dorfman [D6] that the cost
of testing blood specimens of individuals for the presence of a moderately
rare disease could be reduced considerably by combining the
samples of many individuals. If the combined sample showed no sign
of disease, the entire group was passed at the cost of one test. If
the combined sample showed signs of disease, the individual specimens
could be tested separately. With appropriate grouping depending
on the overall frequency of disease, this system and improvements
produced considerable savings. This subject is elaborated upon by
Sobel and Nebenzahl [S3], who contributed a thorough bibliography.
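
The saving from pooling is easy to quantify in the simplest two-stage version. The sketch below assumes, for illustration, that individuals are infected independently with the same probability and that the test is error-free; the function names are hypothetical. With groups of size g, one pooled test is always run and g individual tests follow only if the pool is positive, so the expected number of tests per person is 1/g + 1 - (1 - p)^g.

```python
def expected_tests_per_person(prevalence, group_size):
    """Expected tests per individual under two-stage pooled testing."""
    if group_size == 1:
        return 1.0                      # no pooling: one test each
    return 1.0 / group_size + 1.0 - (1.0 - prevalence) ** group_size

def best_group_size(prevalence, max_size=100):
    """Group size minimizing the expected number of tests per person."""
    return min(range(1, max_size + 1),
               key=lambda g: expected_tests_per_person(prevalence, g))

for p in (0.01, 0.05, 0.2):
    g = best_group_size(p)
    print(p, g, expected_tests_per_person(p, g))
```

At a prevalence of one percent the best group size in this model is 11 and the testing load drops to about a fifth of individual testing, while for common conditions the advantage shrinks quickly.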

11.  Search  Problems 

Search problems have appeared in a variety of contexts and applications.
They deal with the problem of locating an item which may
be in any one (or sometimes possibly none) of $k$ locations, each
of which may be searched and yield the item, if it is there, with a
specified probability. Often these problems are treated as combinatorial
problems and $k$ is large. No attempt will be made to elaborate
on the topic, which has an extensive literature which was surveyed
by Enslow [E3], and some further references are given by Sweat [S10].
A different approach is given by Liptser and Shiryaev [L3], who use
diffusion approximations for a variation of the search problem where
$k$ is not large.

12.  Control  Theory 

Multi-level sampling inspection is one form of control applied
to maintain the quality of a continuous production line. Box and
Jenkins [B9, B10] have considered the problem of monitoring a complex
chemical production process where slow changes in the underlying
environment may require adjustment of inputs to maintain optimality.
They suggest perturbing the inputs off the position that seems optimal,
to detect and estimate possible changes in the response surface by
measuring the efficiency of the system. In this way the estimate of
the current optimum is continuously updated. The price of this is
the loss of efficiency involved in perturbing the system to measure
the response surface. If the perturbation is too small, the surface
and possible changes in it are not measured precisely enough. If
the perturbations are too large, the experimentation reduces the
efficiency of the system. This type of system may be thought of
as a stationary control problem.

13. Forcing Experiments to be Balanced

In clinical trials as well as in many other scientific investigations,
the need to avoid bias requires experimentation where the
parties involved do not know whether they are receiving a treatment
or a control. Thus assignments may be made by using a fair coin,
but in small-sized experiments this may result in a severe imbalance.
Blackwell and Hodges [B3] and Efron [E1] have considered alternative
schemes to complete randomization to avoid several kinds of bias,
e.g., selection bias and experimental bias. One scheme considered
is to assign the treatment with probability $p$ if the treatment
has been used less often than the control and $(1 - p)$ if the control
has been used less often. Efron indicates a preference for $p = 2/3$
and compares the balancing properties of this and other schemes as
well as the potentialities for selection bias and experimental bias.
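
The balancing effect of such a biased coin can be seen in a short simulation. The sketch below is illustrative: it favors whichever arm has been used less often with probability p, uses a fair coin when the two arms are tied (the tie rule is stated here as an assumption about the scheme), and compares the final imbalance with that of complete randomization, which corresponds to p = 1/2. Function names, trial size, and replication count are hypothetical.

```python
import random

def biased_coin_assignments(n, p=2 / 3, seed=0):
    """Assign n subjects; the under-represented arm is chosen with probability p."""
    rng = random.Random(seed)
    n_treat = n_ctrl = 0
    out = []
    for _ in range(n):
        if n_treat < n_ctrl:
            prob_treat = p
        elif n_treat > n_ctrl:
            prob_treat = 1 - p
        else:
            prob_treat = 0.5            # tie: fair coin (assumed rule)
        if rng.random() < prob_treat:
            out.append("T")
            n_treat += 1
        else:
            out.append("C")
            n_ctrl += 1
    return out

def final_imbalance(assignments):
    """Absolute difference between treatment and control counts."""
    return abs(assignments.count("T") - assignments.count("C"))

def mean_imbalance(n, p, reps=500):
    return sum(final_imbalance(biased_coin_assignments(n, p, seed=s))
               for s in range(reps)) / reps

print(mean_imbalance(20, 2 / 3), mean_imbalance(20, 0.5))
```

Larger p gives tighter balance but makes the next assignment easier to guess, which is the trade-off between balance and selection bias that the schemes above are weighing.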

14. Miscellaneous

Problems of information storage and retrieval and error-correcting
codes involve notions of sequential experimentation in a fashion which
does not fit traditional approaches of statistics very well. Nevertheless,
these problems have fundamental statistical aspects.

In clinical problems and control problems there are classes of
problems where the response to an experiment is not observed immediately
and some theory is required to deal with delayed observations
[E2, S8].

A useful bibliography on design of experiments is given by
Herzberg and Cox [H4].